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Abstract 


\ 

n/ 

TWO  nothoda  o f  ootl  nation  for  the  parameters  of  the  nultlvarlata  normal 
distribution  baaed  on  ths  saapls  characteristic  function  ars  (Ivan.  Thaos  methods  ara 
shown  to  hava  an  aquivalant  basis  In  tens  of  Parson  kamal-lika  density  estimation. 
Tha  estimators  for  tha  naan  vactor  and  covarlanca  matrix  ara  dapandant  an  a 
user -specified  paranatar.  variation  of  tha  uaar-spaclflad  paraaatar  producas  a 
rasponsa  surfaca  in  tha  paraaatar  astl aatea  and  tharafora  allows  for  an  informal 
sanoltivlty  analysis  of  tha  data  with  raspact  to  a  tantativa  wortdnt  modal.  Tha 
informal  sanoltivlty  analysis  Is  Intricately  rotated  to  formal  testa  of  fit  of  tha  modal. 
Tha  estimators  of  naan  vactor  and  covarlanca  matrix  hava  daolrabla  robustness 
proportion,  ara  assy  to  compute  and  usa,  ara  ralativaly  efficient  at  tha  multivariate 
normal,  and  ara  usaftil  in  identifying  potential  outliers  and  Inconsistencies  in  soma 
statistical  assumptions.  Those  methods  ara  directly  applicable  to  structured  data  such 
as  multivariate  experimental  deal  (no.  Several  Illustrations  ars  provided. 
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1.  Introduction 


This  paper  address—  the  general  problem  a t  using  Maple  characteristic  functions 
to  construct  robust  estimators  of  location  and  covariance  parameters  at  linear  models 
with  a  p- variate  qauselan  error  structure,  we  envision  the  linear  models  as  tentattvs 
working  models  consisting  at  an  error  structure  (additive  Gineslanty)  and  a  functional 
structure  (e.g. ,  a  linear  regression  or  experimental  design  model),  our  procedures 
have  also  been  used  successfully  on  non-linear  models  but  we  do  not  Address 
non-Unear  models  here.  The  working  model  is  regarded  as  a  single  entity.  or 
course.  If  it  Is  certain  that  the  functional  structure  is  linear  and  that  the  error 
structure  Is  additive,  independent  p-varlate  Gaussian  then  there  would  be  little  interest 
in  estimators  derlvsd  from  ths  sample  characteristic  fLnctlon  hseniss  at  ths  reedy 
availability  at  maximum  likelihood  and  leant  squares  estimators.  But  it  Is  precisely 
because  or  uncertainty  concerning  functional  and  error  structure  In  almost  all  practical 
situations  that  alternatives  to  least  squares  and  maximum  likelihood  become  Interesting. 
The  working  model  should  undergo  a  process  of  criticism  (see  Box,  1979,  Daniel, 
1978,  and  Paulson  and  fflcklln,  1993,  for  discussion)  In  order  to  determine  the 
degree  to  which  the  available  data  and  the  tentative ,  working  model  are  mutually 
consistent.  A  component  of  the  process  of  criticism  can  be  based  on  robust 
estimators i  if  robust  estimators  and.  My,  maximum  likelihood  estimators  "agr— "  in 
the  specification  of  a  model  In  the  sense  that  if  estimators  of  unknown  parameters  are 
does,  then  the  data  and  the  model  nay  be  mutually  consistent,  several  difficulties 
associated  with  the  last  sentence  need  to  be  highlighted,  rirst.  Just  as  is  ths  came 
for  tests  at  fit,  a  particular  robust  method  may  not  be  sensitive  to  certain  types  of 
departure  from  a  working  model  and  thus  the  use  at  the  word  »»«  Secondly,  how  can 
"agreement”  between  working  model  and  the  data  be  objectively  aeeeeeed  and  how  is 


do— nest i  of  maximum  likelihood  and  robust  estimates  to  be  objectively  a—  —  a?  Any 
formal  111111  asmsnt  of  agreement  ia  tantamount  to  a  tact  of  fit  of  the  truth  of  tha 
working  modal.  This  Una  of  d  logical  on  strongly  auggasta  an  intimate  connection  of 
taste  of  fit  with  robuat  methods.  Xf  an  objective  aasaasmant  is  not  available,  then 
robust  methods  should  still  be  of  Interest  since  they  generate  an  informal  sensitivity 
analysis  1  different  but  sensible  methods-apart  from  efficiency  considerations  regarding 
the  methods  -  of  extracting  Information  from  sample  data  relative  to  a  given,  tentative 
model  should  not  lead  to  materially  different  suana fixations  or  conclusions  if  model 
and  data  are  mutually  consistent.  It  is  worth  noting  that  In  many  practical  settings 
the  form  of  the  error  structure  of  a  working  model  may  be  Important  only  as  an 
Indication  of  repeatability  or  homogeneity.  Gaussianlty  per  se,  for  example,  may  be 
of  little  or  no  Interest. 

The  estimation  methods  we  develop  for  causslan  working  models  are  Intimately 
related  to  density  estimation,  but  are  quite  different  In  spirit  from  the  work  of  Parsen 
(1962),  and  Watson  and  Geadbetter  (1963).  One  of  the  main  features  of  our  work  is 
the  development  of  an  objective  function  from  which  robust  estimators  of  location  and 
covariance  are  Jointly  determined .  The  estimation  procedures  developed  herein 
compare  favorably  with  those  of  Maronna  (1976),  Devlin  et  al.  (1975),  Campbell 
(i960),  Gnanadeelkan  and  Kettenrlng  (1972)  and  Buber  (1981,  Chapter  8).  The 
recent  books  by  Buber  (1981),  Barnett  and  tewls  (1978),  and  Gnanadeelkan  (1977) 
provide  excellent  reviews  and  discussions  of  a  major  portion  of  the  literature .  some 
of  the  univariate  counterparts  of  this  paper  were  considered  by  Paulson  and  Nlcklln 


on  Minimisation  of  an  Objective  Function 


m  fundamental  papers  Rosenblatt  (1956),  Parzen  (1962),  and  Watson  and 
beadbetter  (1963)  consldsrsd  the  problaa  of  estimating  a  density.  while  the 
approach  taken  In  these  papers  is  nonparametsrlc ,  we  shall  be  concerned  with  density 
estimation  in  a  pari  Metric  fraaework.  We  shall  be  concerned  exclusively  In  this 
section  with  the  problea  of  estlaatlns  the  aean  vector  m  and  the  positive  definite 
covariance  aatrlx  D*(<t1  j  )  of  the  p- variate  normal  (Gaussian)  density 

f(X)  «  f(X|ji,D)  «  |2irD|'%exp(  *%(x-M)TD‘i(x-M)).  (2.1) 

given  a  random  sample  st,  x2,  — ,  xn,  and  given  that  the  normal  model  is  the 
tentative  working  model.  Functional  structure  is  incorporated  subsequently.  We  shall 
say  that  the  xj  have  the  distribution  Mp(m,0)  for  brevity.  we  use  the  expression 
sensitivity  analysis  because  observational  Information  processed  under  the  aegis  of  the 
working  model  by  different  methods  should  not  change  the  tentative  results  very  much  If 
the  working  model  and  the  data  are  mutually  consistent.  If  they  are  not  mutually 
consistent,  then  substantial  differences  nay  result.  It  is  not  possible  to  give  a 
completely  unambiguous  definition  of  sensitivity  for  every  type  of  problem  that  nay  be 
encountered  in  practice.  The  judgment  of  the  application  area  must  always  be 
Incorporated  Into  the  process  of  assessing  sensitivity, 
bet 


♦„(u)  «  n*1 


E  exp(iuTD‘*(Xj-M)) 


(2.2) 


be  a  sample  characteristic  function;  note  that  when  n.  and  D  are  specified 


K('Mu))  *  \|*U)  »  SXp(  -%uTu), 

where  i*  »  -1,  u  is  a  pxl  vector  of  real  numbers,  and  0*^  represents  the  unique 
symmetric  square  root  matrix  o t  0.  When  4  and  0  are  not  known,  «l>n(u)  is  not  a 
statistic.  However,  the  parameters  4  and  0  of  (2.1)  may  be  estimated  by  aaklnt 
en(u)  and  <P(u)  match  up  in  some  sense.  There  are  many  ways  in  which  this  may  be 
done  but  we  present  only  that  which  we  have  found  to  be  moot  theoretically  as  well  as 
practically  useful. 

If  we  define 


Qn(4t0,m)  »  f  ien(u)  -  OK u ) I •*  axp(  -m*uTu)  du 
J  P  - 


(23) 


J  R(u)  R*( u )  du  a  J  I R( u ) 1 1  du. 


i^tere  *  denotes  complex  conjugate,  Rp  represents  p -dimensional  Euclidean  space,  and 
the  residual  4*n(u)  -e(u)  weighted  by  exp(  -ljm2uTu)  is 
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axp<  -*<  l+m*  )uru) 


1  C  exp<  1utD*^(x«  -M)->pBauru). 


(2.4) 


The  expression  (2.4)  represents  a  difference  of  characteristic  functions  whose  inverse 

is 

r(x)  -  g(x)  -  |n(x) 

where  g(x)  is  the  spherical  normal  distribution  with 


g(  x )  ■  (2tr(i+m*))-|iP  exp<  -%( 1+n*  )*x  xTx), 


(2.5) 


and  gn(x)  i*  «n  eatlaator  of  g(x)  with 


«n(*>  *  n‘*(2ina*)*%P  £  exp( -(2a*  r1*!*} ),  (2.6) 

j»x 


*)  ■  X-D**(Xj-ji). 


The  iipwien  gn(x)  la  unblioo d  for  g(x)  when  tho  Xj  an  p-varlate  giumUii. 
Whan  tho  Xj  are  not  Giuoolan,  gn(x)  can  dtffor  substantially  froa  g(x)  slnco  oach  xj 
haa  an  lnfluonco  in  tho  estimate  of  tho  population  donalty.  Note  that  (2.6)  la  not  tho 
uaual  Parson  kernel  donalty  eatlaator.  Zh  fact,  gn(x)  haa  both  paraaetrlc  (being 
dopondont  on  ji  and  D)  and  nonparaaotric  foaturoo. 

By  tho  anlMrtlaonolonal  voralon  of  Paroovol'a  thooroa  (roller,  1966,  Chaptor  15) 


Qn(p,D,a)  ■  f  |R(u)|*  du  »  (2 ir)P  f  r*(x)  dx. 

j  n_ 


(2.7] 


Batlaatoro  for  m  and  D,  tontativo  on  tho  correctness  of  tho  aodol  (2.1),  aro 
given  by  alnlalslng  Qn(/i,D,«)  over  n  and  D  for  a  apodfiod  value  of  a,  Ookod. 
Explicit  integration  of  (2.3)  yields 


Qn(M.D.a)  ■  ir%P  {n_1a‘P  ^  esp[-  QJkj 


-  2( %+•* )"%P  £  oxp(-  2(1^a|I)  Qj]  *  ndfa*)-^}  ,  (2.8) 


Qj  ■  (Xj-jO'D'^Xj-h), 


Qjk  »  (XJ  -x„  )r(2D)*1(xJ  -X*  ) 


(2.9) 


(2.10) 


*4  u’  9D  * 


we  find  that  the  estimators  of  m  and  0,  m(»),  0(a),  say,  satisfy  the  implicit 


equations 


M  *  C  wJ(a)xj 
J*x  J  J 


(2.11) 


0  *  k(a,n)BA*tD  , 


(2.12) 


where 


A  »  C(*j-M)(*j  *M) r**p£  *(  4a*+2  )"l( Xj  -M)rD*1(XJ  *M)J, 


8  »  CE(*j-*k)(*j-**c)T«*P£-(<*'*>‘x(*)-*k)TD*1'(Xj-xk)], 


Wj(a)  ■  aJ(a)/a.(m), 


»j(«)  »  e*p£  *(  4aa+2  )* l( Xj  -M)r0*l(Xj  *m )  J  * 


bJk(a)  *  expC-^Xj-Xfc  )T(2aiD)*i(xJ-xk  )J, 


a. (a)  »  £  a<(a), 
)*l 


b.  .(a)  »  E  C  *>JK(m), 

j4k  1 


*<".")  -  ~  (1  *  ^  • 


In  (2.12)  EC  runs  over  all  j  and  k  and  E  runs  over  all),  ) ,  k  a  l ,  2,...,n.  A 


convenient  way  of  interpreting  (2.12)  Is  that,  approximately ,  o  is  determined  in  such 


a  way  that  the  product  of  one  estimate  of  the  covariance  matrix,  B,  say,  with  the 


inverse  of  another  estimate  of  the  covariance  matrix ,  A,  say,  is  apart  froa  constants. 


likelihood 


the  identity  eetrix.  Note  that  m(®)  ■  m  »  n'1  c  *i  and  6(00)  *  o 

„  J*1 

n'1  £  (*i  *  x)(x<  -  x)T,  the  usual  method  of  moments  and  maximum 
J»1  J  J 

estimates  for  4  and  0  respectively . 

An  estimator  for  D  based  on  A  in  the  absence  of  B  may  be  developed  In  several 
ways.  First,  the  expectation  of  A  is 


*(A) 


fl+2m*1*( P+*> 

n[i«S*J 


D 


so  that  estimators  for  4  and  D,  say  m*(«)  and  D*(a) ,  are  determined  by  the  Joint 
Implicit  equations 


D 


f2+2m*l*<P+*> 

n  Ir+s?] 


(2.13) 


and  (2.11).  Alternatively,  another  set  of  estimators  for  4  and  0  based  on  A  In  the 
absence  of  B,  £(«)  and  D(m),  say,  can  be  determined  from  the  Implicit  equations 
(2.11)  and 

_  2+2ma  A 

°  *  l+2m*  a. (m)  *  (2.14) 


Equation  (2.14)  arises  from  determlnlnc  the  constant  *  which  makes 
B(C(X(Xj -4)(Xj -4)T-D}  aj<m) )  »  0. 

Maronna  (1976)  gives  further  details  concerning  these  methods  of  construction  of 
robust  estimators  for  4  and  D.  Paulson  (1986)  has  developed  the  estimators  4(a) 
and  6(a)  from  maximising  a  generalized  version  of  the  log  likelihood. 

Estimators  for  0  based  on  8  in  the  absence  of  A  can  be  derived  in  exactly  the 
same  way.  Note  that  these  estimators  need  not  be  paired  with  an  estimator  for  4. 
First,  an  estimator  for  D  based  on  B  alone,  D  -  (a )  say,  may  be  developed  from  the 
Implicit  relationship 


alternatively,  an  eetlnator  for  0,  0+(a)  eay,  nay  be  developed  froa  the  tap  Licit 
relationship 


„  l+aa  B 

2m2  b..(a) 


(2.16) 


We  only  explicitly  use  the  estiaators  4(a), D(a),  A(a),D(a)  and  the  estiaators 
p( x ) ,  D( a )  of  section  S  in  this  paper  as  these  would  be  the  ones  that  would  be 
normally  used  in  practice. 


3.  Numerical  computation  of  the  Estimates  eased  on  Qn(4,D,a) 


The  expressions  (2.11)  and  (2.12)  are  set  out  in  accordance  with  a  fixed  point 
computational  scheme.  The  left  hand  sides  are  designated  as  the  updated  estlaates  of 
4  and  D  j  the  right  hand  sides  have  current  values  of  4  and  D  substituted  wherever 
they  appear.  For  example,  an  estimate  of  0  would  be  computed  via  D • 
k(m,n)8fAf10f  where  A|  and  Bf  are  the  fth  Iterates  of  A  and  8,  each  evaluated  at 
the  fth  iterate  of  4  and  0.  Iteration  proceeds  until  convergence  is  obtained.  A 
Newton -Raphson  scheme  has  also  been  used  in  place  of  the  fixed  point  scheme  but  it 
is  generally  inferior  to  the  fixed  point  scheme  for  p  a  3.  we  have  had  good  success 
in  using  A  *  n*1E  xj  and  o  *  n*lE  (Xj  -A)(*j  *A)T  as  initial  trial  values  of  4  and  0 
in  (2.11)  and  (2.12).  This  strategy  failed  only,  and  infrequently  at  that,  when  the 
Xj  in  question  had  the  character  of  two  or  more  clusters  of  data.  In  this  case  it  is 
clear  that  the  tentative  Gaussian  model  is  inappropriate  in  any  event.  It  is  difficult 
to  specify  the  numerical  behavior  of  the  estimation  procedure  because  this  behavior 


depends  to  such  an  extent  on  the  saaple  of  xj’s  in  question,  and  the  values  of  n,  p, 
n,  and  tha  arror  tolaranea  on  auceaaalva  valuaa  of  Qn(n,D,m) .  Tha  greater  p,  tha 
graatar  tha  nuaber  of  ltaratlona  to  convarcanca.  For  p  ■  1  and  2,  m  ■  l,  and 
convarganca  datandnad  by  auceaaalva  valuaa  of  Qn(M,D,a)  being  laaa  than  .001, 
convergence  to  a  final  aolutlon  was  attalnad  in  fewer  than  30  ltaratlona  for  ovar  ninety 
pageant  of  a  larga  nuaber  of  trial  probleae  of  varlouo  aaapia  alaaa,  10  *  n  a  120. 
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assent 

tha  parcantaga  of  iron  (Fa),  andlua  (Ha),  and  pntaaalua  (K)  obtalnad  In  a  chaadcal 
analysis  of  twenty  geological  apeclaens.  These  are  labeled  sequentially  as  A,  B, . . . , 
T.  All  two  way  aeattarplots  of  this  data  given  In  Figure  l.  Table  1  gives  aatlaatas 

of  tha  coaponents  of  6(a),  D(a)  0(A)  (sea  Section  5  for  deflntlon  of  0(A))  for 
a  ■  (l.a. ,  nawlnnn  likelihood),  aa*2,  and  A*z . 

Tha  aaslaua  likelihood  astlaatora  of  tha  variances  er f ,  and  tha  correlations  p<  j 
mem  drastically  different  froa  the  aatlaatora  ffn(a),  p<j(n),  d14(a),  0«j(a),  and 
Bn  (a),  3ij(a)  whan  aa  »  2.  Of  particular  interest  is  tha  value  of  tha  astlaatora  of 
pt>.  A  case  for  tha  reasonableness  of  each  of  these  astlaatora  of  p2J  can  be  wade 
depending  on  which  observations  one  Is  willing  to  disalsa  as  being  inappropriate  to  a 
oauaslan  nodal.  A  close  exaalnatlon  of  Figure  l  will  reinforce  this  point.  Tha 
naataua  likelihood  estlaator  Is  not  critical  of  tha  data  In  any  sense  while  6(a), 
6(a),  and  D+(n)  are  affectively  clustering  tha  data  In  tha  way  that  each  perceives 
will  retain  as  auch  as  possible  tha  working  oauaslan  nodal.  This  clustering  laplles 


eN 


that  certain  observations  are  being  "filtered  out"  of  the  estimator  of  D  by  being 


heavily  downweighted.  This  aspect  of  clustering  receives  sore  attention  in  the 
following  sections. 

This  example  highlights  the  fact  that  different  estimators  say  focus  on  different 
aspects  of  the  data  under  the  aegis  of  some  tentative  working  model.  Accordingly  we 
see  that  the  parameter  estimates  are  sensitive  functions  of  the  estimation  procedure  or 
of,  also  in  this  case,  a  parameter  of  the  estimation  procedure,  m.  Having  seen  such 
sensitivity,  one  must  conclude  that  certain  data  is  not  consistent  with  the  working 
model,  e.g. ,  perhaps  they  are  outliers,  or  that  the  working  model  of  independent, 
identically  distributed  Gauaslanlty  la  not  appropriate  for  the  setting  at  hand. 
mcidentiaUy,  the  test  of  fit  of  Gaussianlty  of  Paulson,  Roohan,  Hwang,  and  Puller 
(19S6)  rejects  this  data  as  being  Gaussian  with  p-value  <  .01. 


Table  1 

Parameter  Estimates  for  Several  values  of  a2 


re 

Na 

K 

Estimator 

w* 

Pi* 

Pi* 

P** 

D(m) 

2 

1.19 

0.166 

0.633 

-0. 770 

-0.133 

-0.413 

D(m) 

2 

0.776 

0.070 

0.229 

-0.924 

-0.509 

0.137 

*0 

D(m) 

2 

0.92S 

0.07S 

0.245 

-0.929 

-0.496 

0.139 

HUE 

• 

0.644 

0.149 

1.23 

•0.435 

0.099 

0.416 

r* 

K 
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4.  Statistical  Properties  of  p(a),  0(a) 


>2 


The  simultaneously  determined  estimators  M(m)  and  D(a)  reduce,  as  b-m,  to  ths 
usual  method  of  moments  estimators  m  *  n"AE  Xj,  6  »  n*AE(Xj  *p)(Xj  *M)r  •  The 
estimators  A(a)  and  6(a)  are  affine  invariant.  6(a)  is  positive  definite  with 
probability  one  for  aX)  whenever  n>p  although  it  may  be  algorithmically  singular.  The 
estimators  A(a)  and  6(a)  are  consistent  for  n  and  0  for  aX>  if  xx,  x2, ... ,  x„  is  a 
random  sample  from  Hp(m,D).  The  estimators  Ax( a ),  m*(«)....,  Ap(a)  are  Jointly- 
asymptotlcally  normal  for  mx)  If  x1 ,  x2, . . . ,  xn  is  a  random  sample  from  Npoi,D). 

Explicit  evaluation  of  9Qn(u,D shows  that  the  estimator  A(a)  is  an 
M -estimator  for  ji  at  the  p- variate  normal  distribution.  However,  explicit  evaluation 
of  4Qn(ji.D,a)/aD  (and  2.12)  shows  that  D(a)  is  dependent  on  pairwise  differences 
Xj-xk  so  that  6(m)  is  not  an  M -estimator  for  0  (Huber,  1981,  pp.  43-44).  Figures 
2(a)  -  (f)  provide  influence  function  contours  for  the  estimators  Ax(l. 5),  exl(l.S), 
and  o’lzd.S)  at  the  standard  bivariate  normal  distribution  for  correlation  p«0  and 
p«. 9.  m  general,  for  0<m<s  the  influence  functions  are  bounded  and  redeecendent  to 
(matrix -valued,  pxl  or  pxp)  zero  as  the  Euclidean  norm  of  the  pxi  vector  argument 
of  the  influence  function  becomes  unbounded.  Thus  both  p(m)  and  6(a)  are 
qualitatively  robust  estimators  for  n  and  0  for  a  fixed  value  of  m.  The  finite  sample 
multivariable  sensitivity  curves  (see  Barnett  and  Lewis,  1978,  p.  137;  Huber,  1981, 
pp.  15 *16 )  3C(x;m(b)  ,Np(m,D))  and  SC(x;D(a)  ,Np(p,D) )  for  A(«)  and  6(a)  at 
Mp(M.D)  are  also  bounded  and  redeecendent  to  (matrix -valued,  pxi  or  pxp)  zero  as 
the  Euclidean  norm  of  the  pxi  vector -valued  argument  x  becomes  unbounded  for  n>p 
and  CXbk».  Therefore,  the  contours  of  the  sensitivity  curves  of  the  estimators  A(a), 
6(a)  are  closed  and  bounded.  This  close dness  property  implies  (l)  that  the  pr 
of  estimation  may  be  used  as  a  clustering  algorithm,  and  (2)  that  the  process  of 
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Influence  functions  for  (a)  w  1 C m ,  o  =  C,  10)  u  ,  ^ tn  ) 
(c)  3  =  0,  (d)  on(m),  D  =  ie) 

(:)  a  (m),  -  =  .3  at  the  standard  bivariate  normal 
with  correlation  =  3,2. 


■■Mmatlnn  may  be  used  to  evaluate  the  result*  of  nm  clustering  algorithms.  The 
clustering  capability  associated  with  eetimation  of  u  and  0  allow*  for  ldantificatlan  of 
potanttal  outllara  in  multivariate  normal  data. 

Tha  asymptotic  efficiency  of  the  ]-th  component  of  £(m) ,  say  Aj<a),  ralativa  to 
tha  aaapla  mean  la  determined  to  ba 


•«<»><■»  -  [» -  ills  ] 


•%<  P+O 


(4.1) 


whara  c  *  ( l+2m* ) ' 1 .  Tha  aayaptotic  afficiancy  of  tha  1-th  coaponant  of  o(a),  say 
djj(a),  ralativa  to  tha  saapla  variance  is  ouch  aora  difficult  to  obtain  but  lengthy 
computation#  and  extensive  simulation  trials  suggest  that 


eff(ejj(m))  a 


[‘*&n 


1  •*» 


C*  1  *%(  p+* ) 
l-*-2Cj 


(4.2) 


Thaaa  computations  also  suggest  that  asymptotic  afficiancy  of  tha  off-diagonal 
components  of  6(a),  say  djk(a),  ralativa  to  tha  maximum  likelihood  estimator  of  <r]k 
is  bounded  below  by  e£f(<7j  j  (a) ) . 

5.  Modified  Squared  Error  Estimation 

Tha  method  of  section  2  is  not  directly  applicable  to  tha  structured  data  case, 
e.g.,  the  cases  of  regression  models  and  experimental  layout  models.  Another, 

and  more  extensively  applicable,  sample  characteristic  function -based  estimation 

procedure  for  Gaussian  models  is  now  developed.  If  xt,  xt .  xn  is  putatively  a 

random  aample  from  np(m,D),  then 
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K(«sp(  lUT*j  ))  ■  #<u  )  ■  «V<lUTM"%Ur&4). 


(5.1) 


MflM  tha  Jth 


in  u  H 


Rj(U)  «  «sp(  lUXj  )  -  a<u) 


(5.2) 


Hi*  mm  at  Moduli  aquirad  at  tha  raalduala  la  gluon  by 


n  ,  n 

Ku)  ■^RJ(u)*,<u)  «^|Rj(u)|* 


(5.3) 


Tha  com  at  tha  sum  of  Moduli  oquarad  of  raalduala  do— ly  parallala  tho  »»i  sum  of 

aquaraa  at  raalduala  at  laaat  aquaraa,  tha  Major  dUforanca  balng  that  b(u)  dapanda 

an  tha  nulaanca  paraaatar  u  —  wall  —  on  tha  data.  rnfonatlan  concarnlng  n  and  0 

in  prlndpla  nay  ba  axtractad  froa  L  by  nlnlalxtag  L  owar  a  auffidantly 

flxad  grid  of  non— ro  vmlumm  at  u.  Any  auch  aatlaatora  for  m  and  0 
dapand  on  tha  nuabar  at  u-ualu—  aa  wall  —  tha  location  at  tha—  u-valuaa  and  would 
not  ba  afflna  invariant.  CTtla  la  too  duaay  a  atata  of  afZaira  for  practical 
appllcattona  and  —  anothar  approach  la  nacaaaary.  XT  wa  Multiply  both  aid—  at 


Oh  Oh 

-  a  0.  —  a  Q 

On  '  OD 


(5.4) 


by  aap(  -autDu) ,  <XA<n,  and  intagrata  ovar  Rp ,  aatlaatora  at  n  and  0  will  aatlafy  tha 
tapllrlt  Matrix  valuad  aquations 

f  tt,  a^<  -xutDu)  du  ■  o,  f  ^  aap(  -autI*i)  du  »  o.  (5.5) 


Tha—  aatlaatora  JI(A)  and  0(A),  aay,  of  n  and  0  ara  dapandant  on  tha  alngla 
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{Unction  «xp('AuTDu).  Another  benefit  of  multiplying  (5.4)  by  sxp(-AurDu)  la  to 
the  resulting  estimators  affine  Invariant.  After  use  of  the  aatrlx  differentiation 
formulas  of  Dwyer  (1967) ,  the  Integrals  In  (5.5)  nay  be  explicitly  evaluated  and 
rearranged  to  provide  the  joint  implicit  estimating  equations  for  j*  and  D,  namely 


M  ■  C  Wj(A)Xj  , 
J»a 


(5.6) 


D  -  (1>2A)“‘  -4)(Xj  -M)r, 


(5.7) 


where 


rj(  A)  *  exp<  -j  Qj  ),  v.(A)  *^E^Vj(  A ) , 


Wj(A)  ■  Vj(A)/V.(A), 


.  n  fl42A 

Wj(A)  -  exp( -Wj  v^fexpf -W)  )  - 


1*2X1  %<P+*) 


Qj  *  (  Xj  )  f(  (  1+2A  )0  ) '  4(  Xj  -ji  )  . 


The  joint  estimators  of  m  and  D  determined  by  (5.6)  and  (5.7)  may  be  computed  by  a 
fixed  point  algorithm  with  m  *  n'lDtj  and  D  ■  n*  *•£(*)  -u)(*j  *M) r  applying  the 

Initial  guesses  of  JKA),  D(A).  we  have  not  found  second  order  methods  to  be 

necessary  in  computing  the  estimators. 

An  application  of  the  p -dimensional  version  of  Parsevai's  theorem  (Peller,  1966, 
Chapter  15)  shows  the  equivalence  of  equations  (5.5)  to 


WWW 


(  5 .  •  ) 


2<2lT)*  £  f  •  *A<*)}<«*>  ■  fA(*>  *  «A(*-*J  >)  <**  »  0 

J«i  JBpl  1 

and 

2(2’r)BJ£i  Ip  *  «A(*)}c«*>  •  fA<«)  •  <A<*-*,»  **  a  O  <*•»> 

X 

raspactlvaly,  whara  h1(x)  ■  hx(x)  rapraaanta  tha  convolution  of  hx(x)  with 
hx(x).  Nota  that 

f(x)  •  fx(x)  ■  l2fr(H>A)DI'*  axp( -«X-m)T((1*X)D)*1(x-m)) 
and  that 

f  x( x*Xj )  -  |2irAD|*%  a*p(  -»i(x-Xj  )T( AD)_1(x-xj ) )  . 
tha  ranalnlnf  convolution*  in  (5.8)  and  ($.9)  ara  glvan  by 

•  fx(x)  -  Cf(x)  *  l2frAlcr%  a*p( -%*T(AK)-‘x)l]|K>0  . 

and 

?  fx(x)  »  ^  {|2ir(l+A)DI^  axp(  -«i(x-m)t((1*A)D)*1(X-m)J* 

tha  aquations  (5.8)  and  (5.9)  show  that  tha  astinators  £T(  a  )  and  0(A),  nay  ba 
darlvad,  aquivalantly  but  laas  dlractly  than  m(>)  and  D(m) ,  froa  considv rations  of 
paraaatrlc  danalty  aatlaatlon. 


6.  Properties  of  j2(  A),  D(  A ) 


The  estimator*  2(A)  and  D(A)  are  well-defined  for  o<x<od.  it  is  clear  f ran 
(5.6)  that  3(A)  »  n*LDtj  as  xw.  Explicit  evaluation  of  (5. 5)  show  that  3(A)  and 

0(A)  are  M-estiaators  for  and  0  (Buber,  1981,  pp.  43-44).  The  estimators 
are  affine  invariant.  Slight  modification  of  the  arguments  of  Bryant  and  Paulson 
(1979)  show  that  2(A)  and  0(A)  are  consistent  for  u  and  0  when  the  constitute  a 

random  sample  from  Wp(m,D).  The  estimators  2*(A) .  2P(X),  ^(A), 

*!*(*)»•••»  3fn(^) .  0iP(X),...,  3fpP(X)  are  jointly  asymptotically 

normal  and  the  3j(X)  are  asymptotically  mutually  Independent  of  the  arJk( A)  if  the  Xj 
constitute  a  random  sample  from  Np(m,D). 

If  we  define  the  asymptotic  efficiency  of  3(A)  relative  to  A  as  the  ratio  of  the 
determinant  of  their  covariance  matrices,  it  can  be  tftown  that 


f  348  A  44  A *1  *+ *  > 

•“(»*))  *  (448X44X^1  <«•*> 

Similarly,  the  asymptotic  efficiency  of  the  estimator  arkK( A)  of  the  kth  diagonal 
element  of  D ,  crkK ,  relative  to  its  marl  mum  likelihood  estimator  can  be  shown  to  be 


eff(arkK(X) 


r _ i_i* 

ri±£±i 

U+2xJ 

1242AJ 

2  (142X1  IP 

648A44A* 

(142X1 

l3+2xj 

( 342A  )* 

1242X J 
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(6.2) 
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Selected  values  at  thee*  efflcience*  are  given  in  Table  2.  N«  have  not  explicitly 


calculated  the  efficiencies  at  3jk(A)  relative  to  it*  corres ponding 


likelihood 


Table  2 

Asymptotic  efficiencies  at  JXj(A)  (first  tabular  entry)  and  3P 3 j ( A )  (second  tabular 

entry)  for  selected  values  at  A  and  p 


A 


.5 

1 

2 

4 

m 

£ 

.84 

.91 

.96 

.99 

1 

1 

.78 

.87 

.94 

.98 

1 

.79 

.88 

.95 

.98 

1 

2 

.68 

.77 

.84 

.88 

.90 

.74 

.85 

.93 

.98 

1 

3 

.59 

.69 

.76 

.80 

.82 

.70 

.82 

.92 

.97 

1 

4 

.52 

.62 

.69 

.73 

.75 

.55 

.72 

.87 

.95 

1 

8 

.46 

.56 

.63 

.67 

.69 

in  the  c 

as*  p-1. 

tends 

to  unity  with 

increasing  A  but  for 

eff(9kk(A)  Is  bounded  away  froa  unity.  The  efficiency  *ff((9kk(A))  is  monotone 
Increasing  with  A.  Thus,  the  higher  the  dimensionality,  the  larger  the  value  of  A  one 
Mould  uae  If  efficiency  is  a  major  consideration  In  the  choice  at  A  toe  estimation, 
we  ahall  address  this  issue  in  detail  in  section  7. 


The  nature  of  5(A)  u  a-md  la  determined  by  an  application  of  L* Hospital ’a 
rula.  Sona  alaMntary  manipulations  ylald  th*  Implicit  relationship 

D  a  _ _  (6.3) 

fcn(p*>2)-fc  E^(Xj*M)D~L(Xj  -ii) 

that  th*  estimator  must  satisfy  asymptotically  In  a.  when  p*l  (6.3)  may  b* 
rearranged  to  gat  th*  usual  moment  estimator  for  0 ,  but  for  p*2  (6.3)  cannot  b*  so 
rearranged .  Th*  estimator  of  0  defined  by  (6.3)  does  not  seem  to  be  otherwise 
interesting. 

Th*  Influence  functions  for  JX(A)  and  0(A)  at  th*  p-varlat*  normal  are 
similar  to  those  of  A( ■)  and  6(A).  Th*  major  difference  Is  that  th*  Influence 

function  ir(x;D(A),f»p(M,D)),  CKA<®,  while  being  bounded  and  redescendent ,  is 
not  redeecendent  to  zero,  but  rather  to  a  positive  definite  matrix  constant,  as  th* 
Euclidean  norm  of  x  becomes  arbitrarily  large. 

7.  choice  of  a 

There  are  two  possible  uses  for  th*  procedures  w*  propose  here.  (We  shall 
restrict  our  attention  to  th*  A -procedure  although  there  Is  a  parallel  development  for 
th*  a  -  procedures. )  Th*  first  is  to  specify  a  single  value  of  a,  possibly  based  on 
efficiency  considerations,  and  us*  it  as  a  robust  procedure.  The  choices  a*i  or  a *2 
provide  high  efficiencies  and  good  robustness  properties.  The  second  use  Is  the  one 
for  which  th*  procedure  was  developed  and  which  has  proved  most  useful  In  practical 
applications.  Me  us*  the  procedure  to  generate  a  sensitivity  analysis.  In  a  practical 


exploratory  setting  we  first  compute  the  maxima  likelihood  estimators  and  use  these 
for  starting  values  In  the  Iterative  algorithm .  Next  we  take  a  value  of  x,  4  or  2, 
and  examine  the  behavior  of  the  estimates.  Finally,  we  would  take  a»i  or  %  and 
examine  the  behavior  of  the  estimates.  It  la  not  possible  to  completely  specify  the 
values  of  A  because  the  behavior  of  the  estimates  depends  on  the  nature  and  the 
extent  of  the  data.  The  response  surface  of  the  parameter  values  and  the  final 
weights  as  a  function  of  A  are  of  primary  interest.  In  the  process  we  determine 

estimates  and  final  weights  Vj(A)  ■  exp<  -%( 1+2A  )*A(X)  -JU  A)  )*£>(  A)**(xj  -TU  A )) ) 
associated  with  each  observation.  If  the  estimates  are  sensitive  to  this  variation  in 
A,  then  there  may  be  problems  associated  with  either  the  data  or  with  the  Gaussian 
error  model  or  both.  It  will  not  always  be  possible  to  determine  the  source  of  the 
problem.  The  particular  obeervatlon(s)  which  la  (are)  the  potential  cause  of  the 
sensitivity  are  identified  by  low  values  of  Vj(A)  vis -a -via  the  whole  set  of  these 
weights.  This  discussion  will  be  subsequently  clarified  with  an  example. 

The  derivation  which  led  to  the  estimators  given  In  (5.6) -(5.7)  did  not  require 
that  A  in  (5.6)  be  the  same  as  In  (5.7).  We  could,  for  example,  use  A«i  for  JI 

and  A *2  for  0.  Furthermore,  w*  need  not  have  restricted  ourselves  to  a  scalar 
value  of  A  in  order  to  arrive  at  (5.6)  and  (5.7).  At  the  expense  of  greater 
algorithmic  complexity  we  could  have  chosen  values  A1}  corresponding  to  each  a,  j  In 
the  covariance  matrix  0.  Because  0  Is  symmetric,  we  take  A ,  j  *  A  j , .  1st  the  pxp 
matrix  h  •  (l^A^)  and  the  pxp  matrix  M  *  (2+2At])  and  let  LxO  *  ( (1>2A1 }  )<r, } ) 
denote  the  Hadamard  product  of  L  with  D .  Then  by  arguments  similar  to  those 
employed  to  arrive  at  (5.6)  and  (5.7)  It  may  be  shown  that  the  more  general 
estimators  for  n  and  0  satisfy  the  Implicit  relations 


n  *  - : - -  (7.1) 

E  exp(  -l|(Xj-M)r(L*D)-l(XJ-M)) 


and 


LxO 


{  J 
l  J*i 


•*P<-%(Kj  -H)r(l‘XD)‘L(X)-n)) 


)‘1 


I  LxO  I  ^ 

n  755d7*  tw*°r 


L(  LxO ) 


+  E^Otj  •(*)«)  *M)T  «*P(  *«*j  -4)T(LxD)*a(X,  -4)) 


(7.2) 


The  Identity  MxD  *  frrir‘‘*‘]  *  (LxO)  is  useful  in  cooputlng  (7.2).  The 

ll*2A^  j  j 

estimators  of  arc  computed  in  a  component -wise  fashion  from  the  final  Iteration  of 
(7.2).  These  estimators  would  be  at  interest  when  it  is  desired  to  treat  different 
components  of  the  Xj  differently. 


8 .  Examples 

Example  8.1.  The  basic  data  for  this  example  are  taken  from  Anderson  (1958). 
The  first  25  points  consist  at  the  first  two  (of  four)  components  of  this  data  with  five 
additional  (outlying)  observations  appended.  We  have  chosen  2 ,  1 ,  for  this 

illustration.  Table  3  summarizes  the  weights  vj(  A  )~exp(  -i<  xj  -JI)r(  ( 1+2A  )D)*1 
(Xj-jl))  associated  with  each  point  on  the  assumption  that  the  data  follow  a 
single  multivariate  Gaussian  distribution .  Table  4  provides  the  estimates  of  the  means 
and  covariances  as  well  as  the  maximum  likelihood  estimators.  With  A  *  +  ®,  all 
weights  are  the  same.  As  A  decreases  from  +  ®,  the  weights  become  differentiated. 
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TIM  5  outlying  o 

g— Indr  of  tha 


M  arm  randarad  alatinctlva  by  thatr  dllninn  wnighto  Vj(  A) 
indicator  that  thaw  obwrvationa  trt  not  conalatant  with  thn 
Bonn  and  tha  aaaunptlon  of  a  sing  la  Oauoalan  dlatrlbutlon. 


Tabla  3 

Sanaltlvlty  of  Gbwrvatlonal  waighta 
9 1  (  a  )  ( x  ioo )  to  variation  in  A 


Table  4 


sensitivity  of  Paraaeter  Estimates  to  variation  In  A 


A 


4 

2 

1 

.5 

18S.S 

1S5.2 

104.9 

104.9 

Mi 

150.1 

150.3 

149.9 

149.  7 

14S.1 

14S.S 

155.4 

136.6 

<*11 

SO. 2 

74.1 

65.7 

52.3 

Pi.  I 

.52 

.67 

.85 

.87 

Proa  ( 5 . ■ )  or  (S.9)  we  find  that  estimation  of  ji  and  0  Is  Intimately  related  to 
the  estimation  at 

f(x)  *  fA(x)  -  l2ff(i4A)0l*%  exp[-  2  (Xj-ioTd-Vxj-M)] 

by 

Ia(«)  *  n’4  I2SAD(A)C%  exp£-  (x-Xj)r  D_1(  a )(x-xj  )]  . 


X 

The  density  f(x)  •  'x(x)  may  be  retarded  as  a  tentatively  posited  (prior) 
distribution  and  fA(x  as  Its  estimate ,  given  the  Xj’s.  If  the  xj  are  from  Np(ji,D)t 
then  2a(x)  *  iv(x),  for  all  reasonable  pairs  A,  A’  (A  *  A’,  o  <  a,  a*  <  ®.  If  we 
took  A  »  i  and  a*  *  .001,  JA(x)  and  Jx-(x)  would  not  be  similar  unless  n  were  large 
since  then  {A>(x)  ts  approaching  a  sum  of  Dirac  delta  functions.  If  the  xj  are  not 

x 

from  Np(M.D)  but  from  h(x),  say,  then  fA(x)  will  be  an  estimator  for  h(x)  * 

x 

fA(x)  and  fA(x)  can  be  quite  different  from  f(x)  *  fA(x).  we  thus  suggest 
that  the  estimators  p(A)  and  0(A)  and  **( a)  and  D(m)  are  robust  primarily 
because  of  their  Intimate  relationship  to  goodness  of  fit  through  a  parametric  density 
estimation .  Equations  ( S .  S )  and  (5.9)  Imply  the  data  Is  being  adaptively  screened  so 
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as  to  rotate  tea  character  of  Gauaaatety  aa  such  aa  poaalbla  and  that  tha 
estimation  la  really  subordinate  to  density  estimation  or  arror  reconstruction. 

figures  3a,  3b,  3c,  daplcta  what  tha  estimation  procadura  parcalvaa  aa  a 
dacraaaaa.  Por  largo  A  tha  danalty  estimate  fx(x)  la  approximately  uniform.  At  a«2 
tha  density  satlaate  contours  ara  saooth  axcapt  for  ooaa  slight  distortion  in  tha  area 
of  (198,130).  At  A«l  tha  probability  surface  la  distorted  In  tha  vicinity  a£  tha 
contaal nation  but  tha  distortion  la  not  yet  pronounced.  Coapare  tha  astlaatos  of  tha 
paraaoters  and  tha  observation  weights  Vj(A).  At  A»%  tha  distortion  has  tisrnas 
draaatlc  and  Indeed  separate  "hills-  for  the  outlying  points  have  formed.  Agate 
coapare  tha  astlaatas  and  Vj(A)  for  A«%.  Since  tha  density  estimate  perceives  tha 
outlying  observations  as  not  consistent  with  tha  reaateder  of  tha  data  and  tha  single 
auaptlon,  tha  reason  for  tha  down -weighting  of  tha  outlying  observations 
clear,  if  we  let  a*o+,  the  density  sstlastor  becomes  an  average  of  a  set 
at  Dirac  delta  functions  located  at  each  point. 

At  a«Jj,  tha  procadura  has  affectively  clustered  tha  data  with  tha  outlying 
obeeruatlons  excluded  from  tha  mate  cluster.  Accordingly,  as  a  is  varied  In  this 
example,  a  dramatic  change  In  tha  estimators  Implies  the  existence  at  clusters  at 
observations  different  from  tha  mate  cluster  and  not  consistent  with  tha  prior 
assumption  of  Independent,  Identically  distributed  Gausslanlty.  The  reconstruction  of 
the  error  density  becomes  increasingly  critical  with  decreasing  a  .  This  example 
reinforces  tha  connection  of  tha  robust  procadura  with  goodness  at  fit. 


example  >.2.  If  (y j,  *j),  J  »  1,2,...,  n  represents  a  random  sample  from 
Nx(m,D),  m  ■  (Mi,v)r,  then  the  regression  of  y  on  s  Is  given  by 


B(y!s)  ■  Mt  *  -  0O  *  0x*t 
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a 


Tabl*  s 


Ssnottlvtty  Pj(t)(*iOO)  to  variation  in  kxx,  xia,  xlt 

_ ^u) _ 


Y 

M 
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(1.2,4) 
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94 

97 

94 

17 
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12 

92 
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11 

0 

19 

121 

17 

SO 

33 

19 

20 

44 

11 

92 

14 

79 

21 

100 

10 

94 

97 

94 

Tha  p&ruMtar  aotiaatas  u«  sansitiva  function*  of  l.  TTw  points  which  ara  bo 
influanttal  or  potantlally  tnconsiatant  vis-a-vis  ths  linaar  nodal  with  a  Gaussian  arror 


structure  are  dattnlMd  from  observational  weights  Vj(l),  1.*.,  thoa  with  low  values 
of  Vj(t).  Three  points,  2,  it,  19,  tn  especially  singled  out.  Point  ia  represents 
an  extrema  point  in  tha  c -space  and  ia  most  Influential  on  tha  estimate  of  tha  slop# 
Pi  ■  ffu/®u  at  tha  ratraaalon  Una.  Point  2  has  tha  second-aoet  extreme  value  at 
z.  Point  19  produces  an  extreme  rsaidual.  Analogous  rsaults  obtain  if  aU  AtJ  *  A 
and  A  is  varied  in  order  to  determine  tha  response  at  the  parameter  salinates  and  the 
observational  weights  to  variation  in  A.  Practical  experience  shows  that  in  many,  but 
not  all,  cases,  taking  Just  a  single  value  of  A  (or  set  of  values  a,j)  provides 
sufficient  information  concerning  the  response  at  the  parameter  aetlaaiaa  and  weights 
Vj(A)  (or  Vj(D)  to  variation  in  A  (or  l)  tn  the  sense  that  if  A  (the  a,  j )  were 
ftirther  decreased,  the  same  trend  in  rsaponsa  will  be  continued,  m  those  cases  a 
robust  analysts  will  lead  to  tha  same  conclusions  as  tha  sensitivity  analysts.  Xh  some 
cases,  a  change  in  trends  wtu  be  observed  as  a  decreases  so  that  a  single  robust 
analysis  via -a -vis  maximum  likelihood  may  not  glva  tha  same  Indications  as  a  sensitivity 
analysis. 

CxasPle  • .  3 .  The  three  dimensional  data  for  this  illustration  are  taken  from 
Onanadeetkan  (1977,  pp.  SO-S2).  The  Cl  triads  of  his  example  7  wars  obtained  by 
adding  a  spherical  am  melon  noise  component  to  each  at  the  coordinates  on  the  surface 
of  a  specified  paraboloid.  Tha  two-dimensional  scatter  plots  at  these  data  are  not 
suggestive  of  the  data  In  three  dimensions  lying  near  a  curved  surface,  we  determine 
the  response  at  the  parameter  estimates  to  changes  in  A.  The  naan  vector  estimate  at 
AM  is  (-3.SC,  4.72,  2C.94)  while  at  A-%  It  is  (-3.93,  4.70,  2C.9S).  The 
variance  estimates  a  Am  art  <3.ai,  2.33,  2.94)  while  at  Am%  they  are  (4.43, 
2.7C,  3.79)1  the  correlation  estimates  at  AM  are  (-.51,  -.47,  .20)  while  at  A«% 
they  are  (-.SC,  -.50,  .27).  The  estimates  at  the  seen  are  remarkably  stable  but 
the  eMmated  variances  and  abeoluta  values  of  the  correlation  increase  with  a 


dKTMW  in  x.  Thaaa  charaeterlatlca  lnpl y  that  it  tha  oauaaian  dlatrtbutton  nodal  la 
not  appropriate ,  tha  bant  pUea  to  look  for  duncuitiea  la  at  tha  eantroid.  Ttua  la 
oonflmad  by  tha  dlatrlbutlon  at  tha  waighta  ?3(x) ,  aopaclaUy  for  x«%.  For  axanpta, 
for  x«%,  tha  largaat  thraa  waighta  Vj(X),  . •*,  .M,  .$2,  lndlcata  that  thara  a ra  no 
nbaarvattona  naar  tha  eantroid.  Thla  daflclancy  at  obaacvatlano  noar  tha  eantroid  la 
alao  datamlnahla  Cron  x«a  raaulta,  but  it  la  highlighted  at  tha  naallar  vnluaa  of  x. 
Tha  practlca  of  axanlnlng  tha  Vj(X)  la  at  altar  to  that  of  axanlnlng  quantUa-quantlla 
plota  (dnanadaalkan,  1977,  pp.  50-52)  alnca  Vj(K)  la  baaad  on  tha  quadratic  fora 

(xj-j4>))  5  x )(*j  x ) .  Tha  cone lua Iona  drawn  for  thla  axanpia  are  tha 

SBra  a 

Thla  Inforaal  obaarvailon  of  aanaltivlty  of  vartanca  and  covarlanca  aatlnatao  to 
changaa  in  x  can  ba  nada  foraal  no  aa  to  provida  probability  atatananta  eoncacntnc 
appropclatanaar  of  tha  p-variate  noraal  nodal  by  dafbdng  a  taat  at  at  baaad  an  tha 
nowgattva 

S(X)  -  tr<D"1(  X  )D)  ♦  tr< D(  X  )D*  v  ) ) -2p. 

for  x  «  i,  any.  Thla  taot  atatjattc  will  ba  naar  aaro  it  oauaatanity  la  appropriate!  it 
will  ba  larga  if  Oauaalanlty  la  not  appropriate.  Paulaon  and  Swopa  (i9«g)  hava  uaad 
a  atatlatlc  alnllar  to  S(X)  to  taot  for  p- variate  normality.  Thla  axanpla  again  thorn 
tha  lnterralationohip  at  mam a  aapacta  of  robuatnaaa  and  aanaltivlty  analyala  with  taat a 
of  flt  of  nodala. 

9.  Multivariate  Two-Way  Croon  ciaaaiflcation 

Tha  ayatan  of  aquatlona  (5.5)  ara  raadlly  axtandad  to  lncluda  aul tl variate 
ragraaalon  and  daalgn  altuatlona.  wa  Indicate  how  thla  nay  ba  dona  for  tha  caaa  of  a 


cl— Wad  dsslgn.  The  if|u»»n»  «•  olmllar  for  other  deeigne  and 


The  two-way 


cl— Wad  Model  mjt  bo  written 


«*jk|)  »  M  ♦  «j  ♦  dk 


J  ■  1,2, . . .  ,a,  k  »  1,2,. . .  ,b,  f  *  i,2,...,nj|(  a  1,  where  the  xjkg  are 


(9.1) 


be  p 


p*l  location 


with  covariance  matrix  0.  The  quantitlee  m.  a},  0*  are 


Define 


♦jk(u>  *  exp(luT(M+<*j+dk )  *  %uTDu) ) 


•  6  n,k  T 
Uu)  ■  E  E  E  ldik(u)  -  exp( luTxjk|)|*  . 

J*1  k»l  !■! 


Multiply  both  eldee  of  each  of  the  following 


at  „  «L  „  aL  „  0L 

a  •  °-  »7  *  °-  53?  *  *•  «  • 0  • 


by  exp(-AurDu)  and  Integrate  over  Rp  to  get 


(9.2) 


(9.3) 


(9.*) 


f  exp(-Aur0u)  du  *  0 

J  ®e 


(9.5) 


for  •  ■  Mt  #  dk*  0,  J  ■  1,2,...,  a,  k  »  1,2,...,  b.  Explicit  evaluation  of 
(9.9)  leada  to  the  systea  of  implicit  equations 


E  E  {  (Xjhi-M-aj-dk)  vJk|(A)  >  0, 


E  j  (Xjfcg-M-aj-dh)  Vj  k  f(  A ) 


■  0,  J  ■  1,2, . ...  a 


E  j  (Xjkf-M-«j-0k>  ^jkl(A)  -o,  k  ■  1,2,...,  b. 


The  rank  of  thle  system  at  equation!  ia  a+b-1.  The  first  at  these  equations  subsets 
the  constraints 

£  £  |«j"ji,i<X)  •  £  £  £  d*9ji,j(X)  -  0  <§.•) 

be  appended  to  produce  a  full  rank  systesu  Alone  with  (9. ft)  we  obtain  the  laplldt 


M 


C  £  {  *1X1  WA) 

£  £  j  vjk*<x> 


(9.7) 


®1 


E  E  (XJk|  -H  -  **>  *Jk|(X) 

c  {  »mi<M 


1»2( • • *  *  ®*li 


(*.•) 


0% 


E  E  (xJkI  -u  -  «j)  vJki<X> 
E  J  »Jk|(X) 


X »2 , . . . ,  b-x. 


(9.9) 


and 


D  »  ( 1— 2X ) "l 


E  E  E  (*Jkl*M  •«J-dk)(xJ|||-M-aJ*tfk)T»JKf(X) 

J  ^  " _ 


f  fl*2XlWP+’2h 

5  b“<A>  '[2+2XJ  J 


(9.10) 


tdiers 


▼jHl(X)  ■  e^( -%(X+2X)-1  (*jkl-M-Oj-dfc)T0'i(*Jh|-u-aj-0k)>.  (9. XX) 


Observations  xjk|  which  require  apeclal  consideration  are  indicated  as  in  section  • 
by  low  values  of  Vjkf(X)  vts-a-vis  the  whole  set.  A  low  value  of  Vjkf(X)  say  sea 
that  the  particular  observation  la  a  potential  outlier.  Too  sany  low  values  will  laply 


that  the 


of  a  single  Gaussian  parent  may  not  ba  warranted  or  that 


tha  aodal  la  mis-epecified  or  that  thara  a ra  indeed  a  number  at  potantlal  outliers. 
Furthermore,  If  >  1  and  we  find  that  Individual  cells  have  low  vaiuoe  Vjhj(A) 
aaaoclatad  with  than,  than  interaction  In  tha  table  la  a  distinct  poaelbllty.  Xn  this 
case  wa  generalise  tha  nodal  to 


E(Xjkf)  *  M  +  <*J  4-  0t,  +  y)k 


and  proceed  accordingly. 


Thla  nultl variate  procedure  can  ba  eapaclally  useful  for  exploratory  purposaa. 
Detemlnatlon  of  tha  aanaltlvlty  of  Vjk|(A)  and  tha  paraaatar  aatlnataa  to  chances  in 
A  will  eerve  to  uncover  potantlal  problans  with  tha  data  or  tha  wociditc  aodal 
consldarad  aa  a  coneletsnt  elngle  entity.  Tha  procedure  is  computationally  lnaxpanslva 
and  easy  to  use.  w#  have  Investigated  this  procedure  with  respect  to  teats  at 
hypotheses  concemlnc  the  location  parameters  such  as  «j  and  0*  but  tha  rtlatrlhuttnn 


theory 


to  ba  intractable. 


ala  9.1.  Tha  data  for  this  example  concerning  two-way 


classifications 


ware  taken  from  Anderson  (1959,  p.  21S)  who  gives 


additional  background 


concerning  these  data.  Tha  first  component  at  the  observation  vector  la  barley  yield 
on  a  given  plot  in  a  given  year  t  tha  second  component  la  barley  yield  on  tha  same 
plot  made  the  following  year.  Tha  treatments  are  five  varieties  at  barley  and  wa  fit 
tha  modal  (9.1)  to  this  data  by  tha  method  at  maximum  likelihood  and  by  tha  modified 


tha  modal  (9.1)  to  this  data  by  tha  method  of  maximum  likelihood  and  by  tha  modified 
Integrated  squared  error  method  for  various  values  at  A  with  the  objective  of 


performing  a  sensitivity  analysis.  The  results  of  this  analysis  are 
Tables  7  and  t.  we  have  only  given  the  results  for  a«2  since  the  ri 


and  the  final  weights  to  decr< 


of  the 


in  A  continues  the  trend 


evidenced  In  Tables  7  and  t.  Table  7  indicates  that  the  largest  change  occurred  in 


the  para— tec  a,  and  tha  covariance  aatrlx.  Tha  correlation  lncraaaad  £roa  .22  to 
.37.  The  final  welghf  V]k(2)  ara  glv—  In  Table  S.  Ob— rvatlon  (9,3)  racalv—  an 
aapertally  low  weight  while  observations  (1,3),  (3,4),  (9,4),  and,  to  a  laaaar 

actant,  (2,4)  also  racalv*  low  wait!—.  It  Is  llksly  that  tha  conponanla  at  (9,3)  ara 
Inftchingad  and  — ould  raad  (97,69)  lust* ad  of  (69,97).  Tha—  ob— rvatlon*  have 
had  tha  effect  at  reducing  tha  oorralatlon  bat—  tha  first  and  —  cond  year  yield*, 
w*  ara  am— Mm  that  low  weights  rat—  tha  suspicion  of  potantlal  outliers  or  othar 
dlffloiltl—  with  tha  data  and  tha  nodal  (oauastanlty,  additivity,  ate.),  we  are  not 
nmaallnt.  however,  that  this  procadur*  ba  uaad  aa  a  fnraal  tast  for  outllars. 
Incidentally,  tats  of  fit  can  oft—  ba  usad  —  or  in  lieu  at  fats  for  outllars. 

Additional  reduction  at  A  my  to  1.9  will  produce  a  ao— what  stronger  version  of 
basically  tha  aa—  results.  However,  whan  x  Is  dacron—  d  to  unity  the  procedure 
starts  to  datarlorate  in  tha  sen—  that  tha  stnglad>out  ob— rvations  above  receive 
weights  near  saro  and  a  few  at  the  othar  originally  low  weights  have  be—  further 
reduced.  Tha  first  component  variance  Is  dra—tlcalZy  reduced.  The  re  aeon  for  this 
la  that  tha  eodtflad  squared  error  aulttverlata  residuals  are  beco—ng  Increasingly 
separated  —  that  the  esptrtcal  density  estl— tor  of  the  lack  of  fit  distribution 
perceives  Multiple  distributions  and  so—  of  the—  are  quite  separated  froei  each 
othar. 

A  basic  advantage  at  tha  Modified  squared  error  analysis  at  variance  procedure* 
1s  the  direct  adaptive  Involve— nt  of  tha  covariance  structure  in  the  develop— nt  at 
estl— la*  at  —dal  affects »  this  direct  involve— nt  at  covarlanc*  structure  considerably 
enhanc—  our  ability  to  constructively  criticize  (tentative)  analysis  of  variance  and 
regie— lun  —dale.  w*  thus  view  tha  procadur—  and  sathoda  presented  hers  — 
coMpla— ntary  and  subordlnaf  to,  not  as  replace— nf  for,  the  aethods  at  least 
squarw  and  —tImis  likelihood. 


Tibia  f 


Barter  Yteld  la  A  aivan  roar  (lot  tabular  component  on  left) , 
and  Yteld  la  Following  Yaar  (2nd  tabular  conponant  on  left), 
and  Pinal  weight.  Bjk(2)(xi00)  (tabular  coayonent  on  right) 
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